Reconfigurable Processing Units vs. Reconfigurable Interconnects

Authors

  • Andreas Herkersdorf
  • Christopher Claus
  • Michael Meitinger
  • Rainer Ohlendorf
  • Thomas Wild

Abstract

In this paper we discuss different aspects of system reconfiguration and their relation to the specific requirements of the application domain. We introduce two projects, one from the video processing domain and the other from the IP networking domain, that make different use of runtime reconfiguration: the first changes individual processing units, the second changes the logical interconnect structure of the system. We demonstrate that the requirements of the particular application domain are decisive for the design quality and performance of reconfigurable system architectures.
With the availability of sophisticated FPGAs ([1], [2]) that allow parts of the HW resources to be reprogrammed while the rest remains operational, runtime reconfiguration has become an increasingly relevant research topic ([3], [4]). This approach makes it possible to provide different functionalities dynamically, comparable to SW-programmable architectures, but with HW-level performance. It allows system architectures to adapt much more broadly to varying processing requirements at runtime, especially when resources are limited.
Reconfiguration concepts mostly deal with reconfiguring processing units by reprogramming HW resources on FPGA platforms. However, as we will show with the example from the network processing domain, specific requirements rule out reconfiguring the type of processing unit in that application, although the application can still benefit from reconfiguring logical interconnects.
We are working on reconfigurable system architectures in the focus program “Reconfigurable Computing”, which is funded by the German Research Foundation ([5]). The Autovision project develops a processing architecture for an automotive driver assistance application ([6]). Key elements of the corresponding system architecture are different HW accelerator modules (coprocessors) that are optimized for the recognition of traffic participants in different driving environments.
As these modules are not needed simultaneously, the intention is to exchange them at runtime according to the current driving situation. The HW reconfiguration is achieved by updating the internal configuration memory of the underlying FPGA platform, i.e. the physical circuit structure defined by CLBs, routing resources and BRAM content is changed. Compared to a solution in which all coprocessors are implemented in parallel and are permanently available in the system architecture, the main goal of Autovision is to use partial dynamic reconfiguration to reduce the consumption of resources such as CLBs and BRAMs. If no frame may be left unprocessed, the available reconfiguration time is mainly determined by the frame rate of 25 fps (a frame period of 40 ms) and the maximum processing latency of the coprocessors. Hence, timing requirements in this application are in the range of several milliseconds.
The FlexPath project uses reconfigurability to increase network processor performance ([7]). Here, the number and physical interconnection of the architecture resources (CPU cores, packet processing accelerators, memory blocks) are fixed, but the logical interconnect structure is modified at runtime. More precisely, the processing paths, i.e. the sequences of architecture blocks that packets belonging to specific traffic flows traverse, are reconfigured. The reconfiguration consists of modifying memory contents of the rule base that is used for the path decision. The objective is to guide packets through the system with minimum use of the internal communication infrastructure and, in particular, of the embedded processors, and thus to achieve higher performance than by passing all packets through the processor cluster by default.
Dagstuhl Seminar Proceedings 06141 Dynamically Reconfigurable Architectures http://drops.dagstuhl.de/opus/volltexte/2006/779
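The FlexPath-style logical reconfiguration described above can be illustrated with a small lookup table. The sketch below is a minimal model under stated assumptions, not the FlexPath implementation: the block names, the flow key (protocol, destination port) and the default path through the CPU cluster are all hypothetical. It shows that reconfiguring a processing path only rewrites a rule-base entry, while the set of blocks stays fixed.

```python
from dataclasses import dataclass, field

# Hypothetical fixed set of architecture blocks; the real FlexPath blocks
# and their names are not given in the text.
BLOCKS = frozenset({"ingress", "crypto_accel", "cpu_cluster", "egress"})

# Hypothetical default: every packet is passed through the processor cluster.
DEFAULT_PATH = ("ingress", "cpu_cluster", "egress")

FlowKey = tuple  # hypothetical flow key: (protocol, destination port)


@dataclass
class PathRuleBase:
    """Rule base mapping a traffic flow to its processing path."""

    rules: dict = field(default_factory=dict)

    def reconfigure(self, flow: FlowKey, path: tuple) -> None:
        """Logical reconfiguration: overwrite one rule-base entry.

        Only memory contents change; the blocks themselves and their
        physical interconnect are left untouched.
        """
        unknown = set(path) - BLOCKS
        if unknown:
            raise ValueError(f"path uses unknown blocks: {sorted(unknown)}")
        self.rules[flow] = tuple(path)

    def path_for(self, flow: FlowKey) -> tuple:
        """Return the sequence of blocks a packet of this flow traverses."""
        return self.rules.get(flow, DEFAULT_PATH)


rule_base = PathRuleBase()
# Steer a hypothetical flow past the CPU cluster, straight to an accelerator.
rule_base.reconfigure(("udp", 5000), ("ingress", "crypto_accel", "egress"))
print(rule_base.path_for(("udp", 5000)))  # ('ingress', 'crypto_accel', 'egress')
print(rule_base.path_for(("tcp", 80)))    # ('ingress', 'cpu_cluster', 'egress')
```

In this model a reconfiguration is a single table update, so its cost is that of a memory write rather than a partial-bitstream transfer; flows without a rule keep the default path through the processors.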
In FlexPath, reconfiguration is heavily constrained by the packet interarrival time of tens to hundreds of nanoseconds (Gbps links, minimum-sized packets) and by the fact that packet losses are not acceptable.
Given these very different approaches, it is necessary to explain in more detail what is actually meant by runtime reconfiguration of systems. The basic commonality is that such systems are kept fully operational during reconfiguration phases. However, there are fundamental differences concerning the abstraction level of the reconfiguration, its timing behavior, and the underlying HW platform.
The abstraction level mainly relates to the resources involved in the reconfiguration. At one end of the spectrum, a system may be modified by altering the circuitry that determines functionality or connectivity at the physical level. This type of reconfiguration is only possible on FPGA platforms that allow the HW resources to be modified at runtime. In contrast to this low-level reconfiguration, systems may also be reconfigured at a logical level. In this case the HW structure of the system is unchanged, but the usage of the resources is altered by modifying the rule base (mainly memory contents) that guides data through the system. In essence, memory contents are exchanged in both variants, but with a fundamentally different impact: in one case the memory content determines the state of transistors and the functionality of logic gates; in the other case, fixed HW functionality at the application level interprets the memory contents according to a certain convention, resulting, e.g., in a modified usage of the system architecture. The latter type of reconfiguration is feasible on both FPGA and ASIC platforms.
The timing requirements of the application decide whether a given reconfiguration approach is applicable to the underlying architecture. The reconfiguration time, i.e. the time needed to carry out the reconfiguration process, is a key factor in this context, as it determines how long the reconfigured part of the system is not operational. In any case, it must be guaranteed that the system processes requests correctly and consistently. Therefore, only idle times of modules may be used for their reconfiguration. Thus, the maximum processing time for a request plus the reconfiguration time must always be less than the interarrival time of requests.
The reconfiguration time of parts of an FPGA is determined by the throughput of the programming interface and by the size of the reconfigured area, which in turn determines the amount of programming information (partial bitstream) to be written into the configuration memory. Reconfiguration times of around 1 – 100 milliseconds are common. However, there are also theoretical bounds on the shortest possible reconfiguration time. With an 8-bit input data width for accessing the configuration memory, the maximum theoretical throughput is 1 byte per clock cycle. At a clock frequency of 100 MHz, this gives a maximum throughput of 100 kBytes per millisecond. Even a tiny partial bitstream (e.g. 10 kBytes) that could reconfigure only a small fraction of the device would therefore already take 100 microseconds. Such times are far from acceptable in applications like the FlexPath network processor, where reconfiguration has to be completed in a time frame much shorter than reconfiguration times at the physical level. Therefore, an approach is followed that encompasses only reconfiguration of the logical interconnect and no reconfiguration at the physical level, neither of the functionality nor of the
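The timing argument in this section reduces to two formulas: a lower bound on the partial-bitstream write time (bitstream size divided by port width times clock frequency) and the feasibility condition that maximum processing time plus reconfiguration time stays below the request interarrival time. The Python sketch below encodes both under the text's assumptions of a byte-wide configuration port at 100 MHz. The 20 ms coprocessor latency, the 50 ns per-packet processing time and the 100 ns interarrival time are hypothetical values picked from the ranges discussed, not measurements from either project.

```python
# Assumptions: a byte-wide configuration port clocked at 100 MHz, as in the
# text; all latency and interarrival values below are hypothetical examples.

def min_reconfiguration_time_s(bitstream_bytes: int,
                               port_width_bits: int = 8,
                               clock_hz: float = 100e6) -> float:
    """Theoretical lower bound: one port-width word written per clock cycle."""
    bytes_per_second = (port_width_bits / 8) * clock_hz
    return bitstream_bytes / bytes_per_second


def reconfiguration_fits(max_processing_s: float,
                         reconfiguration_s: float,
                         interarrival_s: float) -> bool:
    """Condition from the text: processing + reconfiguration < interarrival time."""
    return max_processing_s + reconfiguration_s < interarrival_s


# 10 kBytes written at 1 byte per cycle and 100 MHz.
t_reconf = min_reconfiguration_time_s(10_000)
print(f"{t_reconf * 1e6:.0f} us")  # 100 us

# Autovision-like case: 25 fps gives a 40 ms frame period; 20 ms latency is hypothetical.
frame_period_s = 1 / 25
print(reconfiguration_fits(20e-3, t_reconf, frame_period_s))  # True

# FlexPath-like case: 50 ns processing and 100 ns interarrival are hypothetical.
print(reconfiguration_fits(50e-9, t_reconf, 100e-9))  # False
```

In the first case roughly half of the 40 ms frame period remains as slack, while in the second even the 100 µs lower bound exceeds the assumed packet interarrival time by three orders of magnitude.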

Similar resources

Reconfigurable optical interconnects for distributed shared-memory systems

Reconfigurable optical interconnects can revive large-scale shared-memory processing.

High-speed (2.5 Gbps) reconfigurable inter-chip optical interconnects using opto-VLSI processors.

Reconfigurable optical interconnects enable flexible and high-performance communication in multi-chip architectures to be arbitrarily adapted, leading to efficient parallel signal processing. The use of Opto-VLSI processors as beam steerers and multicasters for reconfigurable inter-chip optical interconnection is discussed. We demonstrate, as proof-of-concept, 2.5 Gbps reconfigurable optical ...

Performance Evaluation of Large Reconfigurable Interconnects for Multiprocessor Systems

Communication has always been a limiting factor in making efficient computing architectures with large processor counts. Reconfigurable interconnects can help in this respect, since they can adapt the interprocessor network to the changing communication requirements imposed by the running application. In this paper, we present a performance evaluation of these reconfigurable interconnection net...

Design of Reconfigurable Hardware Architectures for Real-time Applications

This thesis discusses modeling and implementation of reconfigurable hardware architectures for real-time applications. The target application in this work is digital holographic imaging, where visible images are to be reconstructed based on holographic recordings. The reconstruction process is computationally demanding and requires hardware acceleration to achieve real-time performance. Thus, t...

The Cameron Project: High-Level Programming of Image Processing Applications on Reconfigurable Computing Machines

Reconfigurable computing maps computation onto flexible and reprogrammable hardware. A typical reconfigurable computing (RC) system consists of a host processor (with a traditional architecture) and one or more reconfigurable coprocessors. Proposed hardware architectures for reconfigurable co-processors fall in two broad categories [4]: netlist computers with uniform arrays of fine grained logi...

Generation of Distributed Arithmetic Designs for Reconfigurable Application

We present a tool for design and implementation of reconfigurable computing applications based on the use of distributed arithmetic. Our tool provides the user the possibility to investigate different tradeoffs like area vs speed for his design. After simulation of the design, a synthesizable HDL code for a reconfigurable platform can be generated. Beside the existing fixed-point solutions for ...

Publication date: 2006